LWS-Det: Layer-Wise Search for 1-bit Detectors
FIGURE 6.11
An illustration of binarization error in 3-dimensional space. (a) The intersection angle θ between the real-valued weight w and activation a is significant. (b) After binarization to (ŵ, â) by the sign function, the intersection angle θ̂ = 0. (c) θ̂ = 0 also holds under XNOR-Net binarization. (d) Ideal binarization via angular and amplitude error minimization.
illustrated in Fig. 6.10. As depicted above, the main learning objective (layer-wise binariza-
tion error) is defined as
\[
E = \sum_{i=1}^{N} \left\| a_{i-1} \otimes w_i - \hat{a}_{i-1} \odot \hat{w}_i \circ \alpha_i \right\|_2^2,
\tag{6.69}
\]
where N is the number of binarized layers. We then optimize E layer-wise as
\[
\arg\min_{\hat{w}_i, \alpha_i} E_i(\hat{w}_i, \alpha_i; w_i, a_{i-1}, \hat{a}_{i-1}), \quad \forall i \in [1, N].
\tag{6.70}
\]
In LWS-Det, we solve Eq. (6.70) by decoupling it into an angular loss and an amplitude loss: the angular loss is optimized by differentiable binarization search (DBS), and the amplitude loss by learning the scale factor.
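As a concrete sketch of this decoupling, the layer-wise error of Eq. (6.69) and the closed-form amplitude fit can be written in NumPy, with the convolution (⊗) simplified to a matrix-vector product over rows of input patches and its 1-bit counterpart (⊙) emulated via sign(). The function names `binarization_error` and `optimal_alpha` are illustrative, not the authors' code:

```python
import numpy as np

def binarization_error(a_prev, w, alpha):
    """Layer-wise binarization error E_i of Eq. (6.69), simplified:
    the convolution is reduced to a matrix-vector product, and the
    1-bit operands are obtained with sign()."""
    real_out = a_prev @ w                       # a_{i-1} (*) w_i
    bin_out = np.sign(a_prev) @ np.sign(w)      # binarized counterpart
    return float(np.sum((real_out - alpha * bin_out) ** 2))

def optimal_alpha(a_prev, w):
    """Closed-form minimizer of E_i over the scale factor alpha:
    the least-squares projection of the real output onto the binary output."""
    real_out = a_prev @ w
    bin_out = np.sign(a_prev) @ np.sign(w)
    return float(real_out @ bin_out / (bin_out @ bin_out))
```

Because the scale factor enters the error quadratically, its optimum has this closed form, which is why LWS-Det can treat the amplitude part separately from the (combinatorial) angular part.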
6.4.3 Differentiable Binarization Search for the 1-Bit Weight
We formulate the binarization task as a differentiable search problem. Since the 1-bit weight is closely tied to the angular error, as shown in Fig. 6.11, we define an angular loss to supervise the search process as
\[
L^{Ang}_i = \left\| \cos\theta_i - \cos\hat{\theta}_i \right\|_2^2
= \left\| \frac{a_{i-1} \otimes w_i}{\|a_{i-1}\|_2 \|w_i\|_2} - \frac{\hat{a}_{i-1} \odot \hat{w}_i}{\|\hat{a}_{i-1}\|_2 \|\hat{w}_i\|_2} \right\|_2^2.
\tag{6.71}
\]
For the learning process of the i-th layer, the objective is formulated as
\[
\arg\min_{\hat{w}_i} L^{Ang}_i(\hat{w}_i; a_{i-1}, w_i, \hat{a}_{i-1}).
\tag{6.72}
\]
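To make the search objective concrete, here is a hedged NumPy sketch: `angular_loss` implements Eq. (6.71) with the convolution reduced to a dot product, and `exhaustive_dbs` enumerates all 2^n sign patterns of a tiny weight vector to solve Eq. (6.72) by brute force. The actual DBS replaces this enumeration with a differentiable relaxation; both names are illustrative:

```python
import itertools
import numpy as np

def angular_loss(a_prev, w, w_bin):
    """Angular loss L_i^Ang of Eq. (6.71), with the convolutions
    reduced to dot products for illustration."""
    a_bin = np.sign(a_prev)
    cos_real = (a_prev @ w) / (np.linalg.norm(a_prev) * np.linalg.norm(w))
    cos_bin = (a_bin @ w_bin) / (np.linalg.norm(a_bin) * np.linalg.norm(w_bin))
    return float((cos_real - cos_bin) ** 2)

def exhaustive_dbs(a_prev, w):
    """Brute-force stand-in for differentiable binarization search:
    return the +/-1 weight vector minimizing the angular loss (Eq. 6.72).
    Only feasible for tiny layers -- 2^n candidates."""
    candidates = itertools.product((-1.0, 1.0), repeat=len(w))
    return min((np.array(c) for c in candidates),
               key=lambda w_bin: angular_loss(a_prev, w, w_bin))
```

Note that the searched binary weight can achieve a strictly lower angular loss than the naive sign(w) binarization, which is the motivation for searching rather than thresholding.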